A Survey of Classification Methods in Data Streams
Authors
Abstract
With advances in both hardware and software technologies, automated data generation and storage have become faster than ever. Data generated continuously in this way is referred to as a data stream. Streaming data is ubiquitous today, and storing, analyzing and visualizing such large, rapidly arriving volumes of data is often a challenging task. Most conventional data mining techniques have to be adapted to run in a streaming environment because of the underlying resource constraints on memory and running time. Furthermore, data streams may often exhibit concept drift, which makes adapting conventional algorithms even more challenging. One such important conventional data mining problem is classification. In the classification problem, we attempt to model the class variable on the basis of one or more feature variables. While this problem has been studied extensively from a conventional mining perspective, it is much more challenging in the data stream domain. In this chapter, we revisit the problem of classification from the data stream perspective. Techniques for this problem need to be thoroughly redesigned to address the issues of resource constraints and concept drift. This chapter reviews the state-of-the-art techniques in the literature along with their respective advantages and disadvantages.
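The abstract names the two constraints that separate stream classification from the batch setting: bounded memory and time per example, and concept drift. As a minimal sketch only (not a method taken from this chapter), the code below shows a hypothetical sliding-window nearest-centroid classifier evaluated in the common test-then-train (prequential) fashion. The names `SlidingWindowCentroid`, `prequential_accuracy` and `drifting_stream` are illustrative assumptions, and the toy stream, in which two fixed points swap labels halfway through, is synthetic. Memory is bounded by the window size, each update touches only one class's running sums, and evicting old examples lets the model forget the pre-drift concept.

```python
from collections import deque
import math


class SlidingWindowCentroid:
    """Hypothetical nearest-centroid classifier over the last `window` examples.

    Memory is bounded by the window size, and evicting old examples lets the
    model forget a concept after drift. Not a method from the surveyed chapter.
    """

    def __init__(self, window):
        self.capacity = window
        self.buffer = deque()   # (features, label) pairs, oldest first
        self.sums = {}          # label -> per-feature running sum over the buffer
        self.counts = {}        # label -> number of buffered examples

    def _update(self, x, y, sign):
        # Add (sign=+1) or remove (sign=-1) one example from the running sums.
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += sign * v
        self.counts[y] = self.counts.get(y, 0) + sign
        if self.counts[y] == 0:  # class no longer present in the window
            del self.sums[y]
            del self.counts[y]

    def learn(self, x, y):
        if len(self.buffer) == self.capacity:
            old_x, old_y = self.buffer.popleft()  # evict the oldest example
            self._update(old_x, old_y, -1)
        self.buffer.append((x, y))
        self._update(x, y, +1)

    def predict(self, x):
        # Return the label whose window centroid is closest (squared Euclidean);
        # None if nothing has been learned yet.
        best, best_dist = None, math.inf
        for y, s in self.sums.items():
            n = self.counts[y]
            dist = sum((v - si / n) ** 2 for v, si in zip(x, s))
            if dist < best_dist:
                best, best_dist = y, dist
        return best


def prequential_accuracy(stream, model):
    """Test-then-train: predict each example before learning from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn(x, y)
    return correct / total


def drifting_stream(n=400, drift_at=200):
    """Toy stream: two fixed points whose labels swap at index `drift_at`."""
    for t in range(n):
        x = (0.0, 0.0) if t % 2 == 0 else (5.0, 5.0)
        label = "a" if t % 2 == 0 else "b"
        if t >= drift_at:
            label = "b" if label == "a" else "a"
        yield x, label


model = SlidingWindowCentroid(window=20)
acc = prequential_accuracy(drifting_stream(), model)
print(f"{acc:.4f}")                # 0.9675
print(model.predict((0.0, 0.0)))   # b (the post-drift label)
```

A sliding window is only one way to handle drift. In this toy run the model keeps predicting the old labels for a short stretch right after the swap, until the new examples outweigh the old ones in the window; a smaller window would recover faster but estimate each centroid from fewer examples.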
Similar resources
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, applying policies to restrict network access and so on. Basic methods in this field used obvious traffic features such as port number and protocol type to classify traffic. However, recent changes in applicat...
Computer Simulation of Particle Size Classification in Air Separators
Cement powder size classification efficiency significantly affects the quality of the final product and the extent of energy consumption in clinker grinding circuits. Static and dynamic or high-efficiency air separators are widely used in closed circuit with multi-compartment tube ball mills, High Pressure Grinding Rolls (HPGR) and, more recently, Vertical Roller Mills (VRM) units in cement plants ...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Algorithmic Techniques for Processing Data Streams
We give a survey of some algorithmic techniques for processing data streams. After covering the basic methods of sampling and sketching, we present more evolved procedures that rely on those basic ones. In particular, we examine algorithmic schemes for similarity mining, the concept of group testing, and techniques for clustering and summarizing data streams. 1998 ACM Subject Classification F...
Survey on Relationship between Morphometric Characteristics of Gullies with Vegetation Distribution (Case Study: Lamerd, Fars Province)
Gully erosion is one of the most significant erosion types. This type of erosion is one of the most important sources of sediment in different regions of the world. Gullies often have different dimensions and complex characteristics, and these characteristics may affect the distribution of vegetation. Topographic characteristics of gullies provide a complex ecosystem for vegetation establishment. ...
A Survey on Classification Algorithm for Real Time Data Streams using Ensembled Approach
Classification and analysis of data streams are among the most promising fields of research and development in data stream mining. The ensemble-based classification approach is one of the most challenging ways of developing an efficient classifier, due to the large number of available base classifiers and the increase in computational time required for training and classification. This research emphasizes dev...